Co-Training
Co-training is a powerful semi-supervised learning algorithm that leverages multiple views or feature sets of the data. It works by training separate models on different feature subsets and allowing each model to teach the other, bootstrapping learning from limited labeled data.
Core Concept
The fundamental idea behind co-training is that:
- Different feature views provide complementary information about the data
- A classifier that is confident about an example on one view can supply a useful label to the classifier trained on the other view
- Through iterative mutual teaching, both classifiers can improve beyond what either could learn alone
Requirements for Effective Co-Training
- Feature Split Compatibility: The feature space can be naturally or artificially divided into "views"
- View Sufficiency: Each view should be sufficient to learn the target concept if given enough labeled data
- View Independence: The views should be conditionally independent given the class
- Initial Accuracy: Base classifiers should achieve better than random performance on the labeled data
Algorithm Steps
- Split features into two (or more) separate views X₁ and X₂
- Train initial models M₁ and M₂ on labeled data using their respective views
- Predict on unlabeled data using both models independently
- Select confident predictions from each model (e.g., those above a confidence threshold or the top-k most confident)
- Teach the other model:
- Add M₁'s confident predictions to M₂'s training set
- Add M₂'s confident predictions to M₁'s training set
- Retrain both models on their expanded training sets
- Iterate until convergence or a stopping criterion is met
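A minimal sketch of this loop in Python, assuming scikit-learn-style estimators with fit/predict_proba and dense NumPy feature matrices; the function name co_train, the per-iteration cap k, and the confidence threshold are illustrative choices rather than a standard API, and for simplicity the sketch keeps one shared labeled pool rather than two separate training sets.

```python
import numpy as np
from sklearn.base import clone

def co_train(model1, model2, X1_lab, X2_lab, y_lab, X1_unlab, X2_unlab,
             n_iter=10, k=10, threshold=0.9):
    """Sketch of two-view co-training with scikit-learn-style estimators."""
    X1_l, X2_l, y_l = X1_lab, X2_lab, np.asarray(y_lab)
    X1_u, X2_u = X1_unlab, X2_unlab

    # Train the initial models, one per view.
    m1 = clone(model1).fit(X1_l, y_l)
    m2 = clone(model2).fit(X2_l, y_l)

    for _ in range(n_iter):
        if len(X1_u) == 0:
            break

        # Both models predict on the unlabeled pool, each using its own view.
        p1, p2 = m1.predict_proba(X1_u), m2.predict_proba(X2_u)
        conf1, conf2 = p1.max(axis=1), p2.max(axis=1)

        # Keep up to k predictions per model that exceed the confidence threshold.
        pick1 = np.argsort(-conf1)[:k]
        pick1 = pick1[conf1[pick1] >= threshold]
        pick2 = np.argsort(-conf2)[:k]
        pick2 = pick2[conf2[pick2] >= threshold]
        pick2 = np.setdiff1d(pick2, pick1)   # don't add the same example twice
        if len(pick1) + len(pick2) == 0:
            break                            # stopping criterion: no confident picks left

        # Mutual teaching: M1's confident labels and M2's confident labels are
        # appended to the shared labeled pool (both views stay aligned).
        new_idx = np.concatenate([pick1, pick2])
        new_y = np.concatenate([m1.classes_[p1[pick1].argmax(axis=1)],
                                m2.classes_[p2[pick2].argmax(axis=1)]])
        X1_l = np.vstack([X1_l, X1_u[new_idx]])
        X2_l = np.vstack([X2_l, X2_u[new_idx]])
        y_l = np.concatenate([y_l, new_y])

        # Remove the newly labeled examples from the unlabeled pool.
        keep = np.setdiff1d(np.arange(len(X1_u)), new_idx)
        X1_u, X2_u = X1_u[keep], X2_u[keep]

        # Retrain both models on their expanded training sets.
        m1 = clone(model1).fit(X1_l, y_l)
        m2 = clone(model2).fit(X2_l, y_l)

    return m1, m2
```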
Mathematical Formulation
Let:
- L be the labeled dataset and U the unlabeled dataset
- x = (x₁, x₂) be a data point with its two views x₁ ∈ X₁ and x₂ ∈ X₂
- M₁ and M₂ be the classifiers for views 1 and 2
The co-training process:
- Train M₁ on {(x₁, y) : (x, y) ∈ L}
- Train M₂ on {(x₂, y) : (x, y) ∈ L}
- For each iteration:
- Let P₁ be M₁'s most confident predictions on U, paired with M₁'s predicted labels
- Let P₂ be M₂'s most confident predictions on U, paired with M₂'s predicted labels
- Update M₂ by retraining on L ∪ P₁
- Update M₁ by retraining on L ∪ P₂
- Remove the newly labeled examples from U
Example: Document Classification
Consider classifying web pages as "course" or "non-course" pages:
Natural Views:
- View 1: Text content of the page
- View 2: Text of links pointing to the page
Algorithm Application:
- Train a text classifier on page content from labeled pages
- Train a link text classifier on anchor text from labeled pages
- Both classifiers predict on unlabeled pages
- Text classifier confidently labels some pages based on content
- Link classifier learns from these newly labeled examples
- Link classifier confidently labels other pages based on anchor text
- Text classifier learns from these newly labeled examples
- Iterate, expanding labeled dataset for both classifiers
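As a usage sketch, the two views could be built with scikit-learn vectorizers and passed to the co_train helper sketched earlier; the variable names (page_texts, anchor_texts, labels, and their unlab_* counterparts) are placeholders for the actual corpus, and the matrices are densified only to keep the sketch simple.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# View 1: the page's own text.  View 2: anchor text of links pointing to it.
# page_texts, anchor_texts, labels and the unlab_* lists are assumed to exist.
vec_page = TfidfVectorizer(max_features=5000)
vec_link = TfidfVectorizer(max_features=2000)

X1_lab = vec_page.fit_transform(page_texts).toarray()
X2_lab = vec_link.fit_transform(anchor_texts).toarray()
X1_unlab = vec_page.transform(unlab_page_texts).toarray()
X2_unlab = vec_link.transform(unlab_anchor_texts).toarray()

# One naive Bayes classifier per view; co_train is the sketch shown earlier.
text_clf, link_clf = co_train(MultinomialNB(), MultinomialNB(),
                              X1_lab, X2_lab, labels,
                              X1_unlab, X2_unlab,
                              n_iter=20, k=5, threshold=0.95)
```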
Variations of Co-Training
Multi-View Co-Training
- Extends beyond two views to multiple complementary views
- Each view contributes to teaching other views
Co-EM (Co-Expectation Maximization)
- Combines co-training with Expectation-Maximization
- Uses probabilistic models for each view
- Soft assignment of labels based on probabilities
Tri-Training
- Uses three classifiers instead of two
- A prediction becomes a label when two classifiers agree
- Eliminates the need for explicit confidence estimation, since agreement takes its place
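A compact way to express the agreement rule, assuming three already-trained scikit-learn-style classifiers; the function and variable names are illustrative.

```python
import numpy as np

def agreement_labels(clf_i, clf_j, X_unlab):
    """Return the unlabeled examples on which two classifiers agree,
    together with the agreed label, for teaching the third classifier."""
    pred_i = clf_i.predict(X_unlab)
    pred_j = clf_j.predict(X_unlab)
    agree = pred_i == pred_j
    return X_unlab[agree], pred_i[agree]

# e.g. clf_c is taught with the examples that clf_a and clf_b agree on:
# X_new, y_new = agreement_labels(clf_a, clf_b, X_unlab)
```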
Democratic Co-Learning
- Ensemble of multiple classifiers
- Majority voting determines which examples to add to training set
- Helps mitigate individual classifier biases
Advantages of Co-Training
- Leverages view-specific information: Utilizes complementary aspects of the data
- Robustness: Less prone to error propagation than single-model self-training
- Theoretical guarantees: Under certain conditions, can learn effectively with few labeled examples
- Sample complexity: Under its view-independence assumptions, can greatly reduce the number of labeled examples needed
- Ensemble effect: Final combined classifier often outperforms individual view classifiers
Limitations
- View requirement: Requires multiple naturally occurring or created views
- View independence: Performance degrades if views are highly correlated
- Error propagation: Can still reinforce errors if both views make similar mistakes
- Implementation complexity: More complex to implement than self-training
- Hyperparameter sensitivity: Performance depends on confidence thresholds and other parameters
Creating Artificial Views
When natural views aren't available:
- Random Feature Splitting: Randomly divide features into two sets
- Clustering-Based: Use clustering to create feature groupings
- PCA-Based: Split features based on principal components
- Data Augmentation: Create additional views through different transformations
- Different Classifiers: Use the same features but different learning algorithms
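For the first option, a minimal sketch of a random feature split; the function name and the 50/50 split ratio are arbitrary choices.

```python
import numpy as np

def random_feature_split(X, seed=0):
    """Randomly partition the columns of X into two disjoint artificial views."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(X.shape[1])
    half = X.shape[1] // 2
    cols1, cols2 = perm[:half], perm[half:]
    return X[:, cols1], X[:, cols2], (cols1, cols2)

# The returned column indices must be reused to project any later data
# (e.g. the unlabeled pool or a test set) into the same two views.
```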
Applications
- Web Page Classification: Using page content and link structure
- Email Categorization: Using email body and headers as different views
- Image Recognition: Using different aspects like color and texture
- Multimodal Learning: Using text and images as complementary views
- Bioinformatics: Using different types of biological data
Co-training remains one of the foundational algorithms in semi-supervised learning, particularly valuable when data naturally comes with multiple views or when feature splits can be engineered to provide complementary information.